Flakify: A Black-Box, Language Model-Based Predictor for Flaky Tests

Abstract

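Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., passing and failing across executions, even for the same version of the source code. Flaky test cases introduce overhead to software development, as they can lead to unnecessary attempts to debug production or testing code. Besides rerunning test cases multiple times, which is time-consuming and computationally expensive, flaky test cases can be predicted using machine learning (ML) models, thus reducing the wasted cost of re-running and debugging these test cases. However, state-of-the-art ML-based flaky test case predictors rely on pre-defined sets of features that are either project-specific, i.e., inapplicable to other projects, or require access to production code, which is not always available to software test engineers. Moreover, given the non-deterministic behavior of flaky test cases, it can be challenging to determine a complete set of features that could potentially be associated with test flakiness. Therefore, in this article, we propose Flakify, a black-box, language model-based predictor for flaky test cases. Flakify relies exclusively on the source code of test cases, so it does not need to (a) access production code (black-box), (b) rerun test cases, or (c) pre-define features. To this end, we employed CodeBERT, a pre-trained language model, and fine-tuned it to predict flaky test cases using the source code of test cases. We evaluated Flakify on two publicly available datasets of flaky test cases (FlakeFlagger and IDoFT) and compared our technique with the FlakeFlagger approach, the best state-of-the-art ML-based, white-box predictor for flaky test cases, using two different evaluation procedures: (1) cross-validation and (2) per-project validation, i.e., prediction on new projects. Flakify achieved F1-scores of 79% and 73% on the FlakeFlagger dataset using cross-validation and per-project validation, respectively. Similarly, Flakify achieved F1-scores of 98% and 89% on the IDoFT dataset using the two validation procedures, respectively. Further, Flakify surpassed FlakeFlagger by 10 and 18 percentage points (pp) in terms of precision and recall, respectively, when evaluated on the FlakeFlagger dataset, thus reducing the cost bound to be wasted on unnecessarily debugging test cases and production code by the same percentages (corresponding to reduction rates of 25% and 64%). Flakify also achieved significantly higher prediction results when used to predict test cases on new projects, suggesting better generalizability over FlakeFlagger. Our results further show that a black-box version of FlakeFlagger is not a viable option for predicting flaky test cases.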
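The two evaluation procedures named in the abstract differ in how test cases are divided between training and testing. The sketch below illustrates that difference along with the precision, recall, and F1 computation; it is not the authors' implementation. The function names, the round-robin fold assignment, and the toy data are assumptions made for illustration, and no model is trained here.

```python
from collections import defaultdict


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (1 = flaky)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_splits(n_samples, k):
    """Cross-validation: yield (train_idx, test_idx) for k folds.

    Samples are assigned to folds round-robin (an assumption), so test
    cases from the same project can appear in both train and test sets.
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def per_project_splits(projects):
    """Per-project validation: hold out one whole project at a time.

    `projects[i]` is the project name of test case i. The held-out
    project is unseen during training, mimicking prediction on a new project.
    """
    by_project = defaultdict(list)
    for i, name in enumerate(projects):
        by_project[name].append(i)
    for name, test in by_project.items():
        train = [i for i in range(len(projects)) if projects[i] != name]
        yield name, train, test


# Hypothetical toy data: six test cases spread over three projects.
projects = ["alpha", "alpha", "beta", "beta", "gamma", "gamma"]
for name, train, test in per_project_splits(projects):
    # No project ever appears on both sides of a per-project split.
    assert not {projects[i] for i in train} & {projects[i] for i in test}

# Hypothetical labels and predictions for five test cases.
print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Because per-project validation never lets the training set see the held-out project, its scores are usually the more conservative estimate of how a predictor behaves on a new project, which is consistent with the lower per-project F1-scores reported above.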

Similar resources

DeFlaker: Automatically Detecting Flaky Tests

Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, any new test failures would indicate regressions caused by the latest changes. However, some test failures may not be due to the latest changes but due to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests i...

A Formal Language and Analysis Tool for Black Box Specifications

The black box specification, developed by Harlan Mills, addresses the problem of software errors that result from failing to properly specify a response for an input scenario. Each black box models how an artifact responds to a particular input from its environment. This response depends on both the current input and the entire history of interactions it has had with the environment. We have ob...

PMU-Based Matching Pursuit Method for Black-Box Modeling of Synchronous Generator

This paper presents the application of the matching pursuit method to model a synchronous generator. This method is useful for online analysis. In the proposed method, the field voltage is considered as the input signal, while the terminal voltage and active power of the generator are the output signals. Usually, the difference equation with a second degree polynomial structure is used to estimate the co...

Surrogate-based methods for black-box optimization

In this paper, we survey methods that are currently used in black-box optimization, i.e., problems whose objective functions are very expensive to evaluate and for which no analytical or derivative information is available. We concentrate on a particular family of methods, in which surrogate (or meta) models are iteratively constructed and used to search for global solutions.

Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course "The Study of Islamic Texts Translation"

The aim of the present study is to propose a pattern, based on speech acts and language functions, for developing materials for the course "The Study of Islamic Texts Translation". In the new pattern, unlike existing textbooks, Austin's (1962) model of speech act levels, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions are used in order to develop better and more engaging materials. For this purpose, 57 holy verses were randomly selected from different parts of the Quran...

Journal

Journal title: IEEE Transactions on Software Engineering

Year: 2023

ISSN: 0098-5589, 1939-3520, 2326-3881

DOI: https://doi.org/10.1109/tse.2022.3201209